ABSTRACT

We introduce a new photometric estimator of the HI mass fraction ($M_{\rm HI}/M_*$)
in local galaxies, which is a linear combination of four parameters: stellar mass,
stellar surface mass density, NUV - r colour, and g - i colour gradient. It is
calibrated using samples of nearby galaxies (0.025 < z < 0.05) with HI line
detections from the GASS and ALFALFA surveys, and it is demonstrated to provide
unbiased $M_{\rm HI}/M_*$ estimates even for HI-rich galaxies. We apply this
estimator to a sample of ~24,000 galaxies from the SDSS/DR7 in the same redshift
range. We then bin these galaxies by stellar mass and HI mass fraction and compute
projected two-point cross-correlation functions with respect to a reference galaxy
sample. Results are compared with predictions from current semi-analytic models of
galaxy formation. The agreement is good for galaxies with stellar masses larger
than $10^{10}\,M_\odot$, but not for lower mass systems.

We then extend the analysis by studying the bias of HI-poor or HI-rich galaxies
with respect to galaxies with normal HI content on scales between 100 kpc and ~5
Mpc. For the HI-poor population, the strongest bias effects arise when the
HI deficiency is defined in comparison to galaxies of the same stellar mass and
size. This is not reproduced by the semi-analytic models, where the quenching of
star formation in satellites occurs by "starvation" and does not depend on their
internal structure. HI-rich galaxies with masses greater than [...] are found to be
anti-biased compared to galaxies with "normal" HI content. Interestingly, no such
effect is found for lower mass galaxies.

Key words: galaxies: clusters: general - galaxies: distances and redshifts - cosmology:
theory - dark matter - large-scale structure of Universe.



1 INTRODUCTION 

Over the past decade, large optical spectroscopic surveys such as the 2dF Galaxy
Redshift Survey (2dFGRS; Colless et al. 2001) and the Sloan Digital Sky Survey
(SDSS; York et al. 2000) have led to a resurgence in studies of the large-scale
structure of the Universe traced by galaxies. There are two main applications of
such studies: a) to constrain cosmological parameters such as the matter density
of the Universe $\Omega_m$, Hubble parameter $h$, fluctuation amplitude $\sigma_8$
and neutrino mass in conjunction with constraints from other experiments, such as
cosmic microwave background (CMB) or Lyman $\alpha$ forest measurements (e.g.
Spergel et al. 2003; Tegmark et al. 2004; Eisenstein et al. 2005), b) to constrain
models for the formation and evolution of the galaxy population.

Traditionally, large-scale structure studies that focus on 
cosmological applications aim to measure the clustering sig- 
nal on large scales (tens of Mpc or greater). On large scales, 
the clustering amplitude depends only on the mass of the 
dark matter halos that host the galaxies. All galaxies, re- 
gardless of mass or type, trace the underlying dark matter 
density field in a simple linear fashion, so constraints on 
cosmological parameters are believed to be robust. 

In contrast, studies aimed at constraining galaxy formation focus on the
clustering signal on scales less than ~5 Mpc. On these scales, the clustering
amplitude depends not only on the mass of the dark matter halos in which galaxies
are found, but also on the location of galaxies within their host halos
(Benson et al. 2000; Peacock & Smith 2000).

In the current paradigm of galaxy formation within a merging hierarchy of dark
matter halos, galaxies form when gas is able to cool, condense and form stars at
the centres of dark matter haloes. At a later stage, the galaxy may be accreted
into a larger dark matter halo and become a satellite galaxy in a group or a
cluster. Gas is no longer supplied to these galaxies and star formation
subsequently shuts down over some timescale (Kauffmann et al. 1993;
Cole et al. 1994). In more recent models, gas is no longer supplied to central
galaxies with central super-massive black holes located in dark matter halos
containing a hot gas atmosphere (Croton et al. 2006; Bower et al. 2006).

One important goal in modern galaxy formation is to 
understand the physics behind these gas-related "accretion" 
and "quenching" processes in detail, because the timescales 
over which they operate and the way in which their efficien- 
cies scale with halo and/or galaxy mass will determine how 
the galaxy population as a whole evolves as a function of 
cosmic epoch. Clustering analysis is a powerful tool in this 
endeavour. In particular, analysis of the cross-correlation 
between a specific galaxy sub-population and a larger "ref- 
erence" sample allows one to maximize the S/N of the clus- 
tering measurement when the size of the sub-sample is small. 
This technique has recently been applied to sub-samples of 
narrow-line galaxies with actively accreting black holes in 
the SDSS to demonstrate that these galaxies are not triggered by mergers and are
found preferentially at the centres of their dark matter haloes
(Li et al. 2006b, 2008).

The clustering of galaxies as a function of their neutral gas content should in
principle yield very interesting constraints on gas accretion and quenching
processes in galaxies (e.g. Popping et al. 2009; Kim et al. 2011).
Meyer et al. (2007) determined the two-point autocorrelation function (2PCF) of
HI-rich galaxies using 4315 galaxies from the HI Parkes All Sky Survey (HIPASS;
Zwaan et al. 2005) and found that HI-selected galaxies exhibit weaker clustering
than optically selected galaxies of the same luminosity. Recently,
Passmoor et al. (2011) measured the 2PCF for an early release of the Arecibo
Legacy Fast ALFA (ALFALFA; Giovanelli et al. 2005) sample, finding similar
results. The Meyer et al. (2007) study also looked at the dependence of clustering
on total HI mass, finding it to be weaker than the dependence on both luminosity
and rotation velocity.

Up to now, there has been no attempt to study how clustering depends on HI mass
fraction (i.e. $M_{\rm HI}/M_*$), a quantity that ought to be much more directly
related to accretion and quenching processes that affect the gas content of a
galaxy, but not its stellar mass. In addition, a power-law form for the
correlation function has been assumed in these previous clustering analyses, which
means that information about the location of the galaxies within their dark matter
haloes (alternatively, the central or satellite galaxy fraction) cannot be
recovered. Finally, because available samples are small, it has not been possible
to study clustering as a function of HI mass fraction in conjunction with other
galaxy parameters, such as stellar mass or stellar surface mass density. In this
work we will demonstrate how an approach that combines HI data for a small, but
complete sample of 1000 galaxies and optical data for a much larger sample of
galaxies from the SDSS Data Release 7 (DR7; Abazajian et al. 2009) can be used to
study the influence of dark matter halo mass and environment on the gaseous
properties of galaxies.

The GALEX Arecibo SDSS Survey (GASS; Catinella et al. 2010) is measuring the
atomic gas content of a sample of ~1000 galaxies with redshifts and stellar masses
in the ranges 0.025 < z < 0.05 and $10^{10} < M_* < 10^{11.5}\,M_\odot$. Each
galaxy is observed until the HI line is detected or until an upper limit of ~0.015
in the atomic-to-stellar mass ratio is reached. The GASS galaxies are selected
from the SDSS spectroscopic and Galaxy Evolution Explorer (GALEX;
Martin et al. 2005) surveys, so stellar masses, sizes and structural parameters
are available from the MPA/JHU data base (http://www.mpa-garching.mpg.de/SDSS/).
The scaling relations of the HI mass fraction of the GASS galaxies
($M_{\rm HI}/M_*$), as a function of global galaxy parameters such as stellar mass
$M_*$, surface mass density $\mu_*$, light concentration index $C$ (defined as
$R_{90}/R_{50}$, the ratio of the radii enclosing 90 and 50 per cent of the total
r-band light) and specific star formation rate SFR/$M_*$, are presented in
Catinella et al. (2010, hereafter C10) and Schiminovich et al. (2010).

Following the work of Zhang et al. (2009), C10 defined a gas-fraction "plane"
linking HI mass fraction, stellar surface mass density and NUV - r colour that
exhibited a scatter of 0.315 dex in $\log_{10} M_{\rm HI}/M_*$, considerably
tighter than the relation between HI mass fraction and optical/near-IR colour
studied by Kannappan (2004), which had a scatter of ~0.4 dex. The improvement in
scatter indicates that the HI content of a galaxy scales with its physical size as
well as with its star formation rate. In subsequent work, Wang et al. (2011)
showed that at fixed NUV - r colour and stellar surface density, galaxies with
larger HI gas fractions have bluer outer disks.

In this paper, we include the colour gradient of galaxies as an additional
parameter in our fits. This produces a relation with similar scatter, but one that
better predicts the HI mass fraction of the most gas-rich galaxies in our samples.
We use this relation to predict the HI content of the galaxies in our SDSS/DR7
sample. We then study how clustering depends on both "pseudo" HI mass fraction and
a "pseudo" HI excess/deficiency parameter, which we define as the deviation in the
predicted HI content of a galaxy from the average HI content of all galaxies of
the same stellar mass and surface mass density. This "pseudo" HI excess/deficiency
parameter depends on a combination of NUV - r colour and g - i colour gradient.
Finally, we compare our results with clustering predictions from the semi-analytic
galaxy formation models of Fu et al. (2010, hereafter F10) and Guo et al. (2011,
hereafter G11). The main way in which the F10 model differs from the G11 model is
that it includes simple prescriptions for molecular gas formation processes.

The motivation behind expressing the results in this paper in terms of "pseudo" HI
fraction, rather than in terms of directly measured photometric quantities, is
that this provides insight into the physical processes regulating the gas supply
in galaxies. The semi-analytic models make a host of assumptions about how gas is
accreted from the surrounding dark matter halo and then consumed by star
formation. By comparing clustering as a function of gas fraction in the models
with the data, we hope to ascertain whether these assumptions are correct, or
whether there are discrepancies that warrant further investigation.

Throughout this paper we have assumed a cosmology with $\Omega = 0.3$,
$\Lambda = 0.7$ and $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$ when computing observed
quantities. A Hubble constant of $H_0 = 100$ km s$^{-1}$ Mpc$^{-1}$ is assumed
when presenting correlation function measurements. We note that the F10 and G11
models are based on simulations with $\Omega = 0.25$ and $\Lambda = 0.75$. This
will make a small difference in the comparison between data and models. The focus
of this paper, however, is not on obtaining precision fits to the data, but on
identifying major discrepancies that may lead us to change the input physics in
the model.
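The two Hubble-constant conventions above can be illustrated with a short numerical sketch. It assumes a flat universe (consistent with 0.3 + 0.7 = 1) and neglects radiation; the function name `comoving_distance_mpc` and the integration grid are illustrative choices and are not taken from the paper.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def comoving_distance_mpc(z, h0=70.0, omega_m=0.3, omega_l=0.7, n_steps=2001):
    """Line-of-sight comoving distance in Mpc for a flat, radiation-free model."""
    zz = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)
    # Trapezoidal rule written out explicitly (independent of the NumPy version).
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))
    return C_KM_S / h0 * integral

# Distances to the two edges of the redshift range 0.025 < z < 0.05.
d_lo = comoving_distance_mpc(0.025)
d_hi = comoving_distance_mpc(0.05)
# The same outer distance expressed in h^-1 Mpc, i.e. computed with H0 = 100.
d_hi_h = comoving_distance_mpc(0.05, h0=100.0)
print(f"D_C(0.025) = {d_lo:.1f} Mpc, D_C(0.05) = {d_hi:.1f} Mpc "
      f"({d_hi_h:.1f} h^-1 Mpc)")
```

Because the distance scales as 1/H0, switching from 70 to 100 km s$^{-1}$ Mpc$^{-1}$ simply multiplies every length by 0.7, which is why correlation functions quoted in $h^{-1}$ Mpc are independent of the adopted value of $h$.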



2 DATA 

2.1 GASS galaxy sample 

The parent sample of GASS consists of 12,006 galaxies selected from the region of
sky where the sixth data release (DR6; Adelman-McCarthy et al. 2008) of the SDSS
overlaps the maximal ALFALFA footprint. All galaxies are selected to have stellar
masses $M_* > 10^{10}\,M_\odot$ and redshifts in the range 0.025 < z < 0.05. The
GASS sample is constructed by randomly selecting a subset of ~1000 galaxies from
the parent sample within the footprint of the GALEX Medium Imaging Survey so that
the distribution in stellar mass is flat. The targets are observed with the
Arecibo radio telescope until detected or until an HI mass fraction
$M_{\rm HI}/M_*$ limit of 1.5-5 per cent is reached. In this work, we use the
representative sample of 480 GASS galaxies, including 293 detections and 187
non-detections, described in Catinella et al. (2011). Details of the GASS survey
design, target selection and observing procedures can be found in C10.



2.2 SDSS galaxy samples 

We have constructed two galaxy samples from the 
SDSS/DR7. 

The first sample, which will serve as our reference sample in the clustering
analysis, is a magnitude-limited sample of 66,461 galaxies with $r < 17.6$,
$-24 < M_{^{0.1}r} < -16$ and redshifts in the range 0.025 < z < 0.05. Here, $r$
is the r-band Petrosian apparent magnitude, corrected for Galactic extinction, and
$M_{^{0.1}r}$ is the r-band Petrosian absolute magnitude, corrected for evolution
and K-corrected to its value at z = 0.1. These selection criteria, with the
exception of the redshift range, are the same as in our previous papers where we
studied the dependence of galaxy clustering on luminosity and stellar mass (e.g.
Li et al. 2012). We have generated random samples that have the same sky coverage
as well as the same position- and redshift-dependent selection effects as the
reference sample. Details of this procedure are presented in our previous papers
(e.g. Li et al. 2006a).

The second sample contains 36,136 galaxies, and is a subset of the reference
galaxies with stellar masses in the range
$10^{9.5}\,M_\odot < M_* < 10^{11}\,M_\odot$. In the next section we will estimate
an HI mass fraction for each galaxy in this sample using our newly calibrated
photometric estimator. We select a number of subsamples binned according to $M_*$
and $M_{\rm HI}/M_*$, and cross-correlate these with both the reference and the
random samples.

2.3 Physical properties of galaxies 

The physical quantities necessary for this work include stellar mass $M_*$,
stellar surface mass density $\mu_*$, NUV - r colour and colour gradient
$\Delta_{g-i}$. Stellar masses are derived from SDSS photometry using the
methodology described in Salim et al. (2007). These masses are publicly available
at http://www.mpa-garching.mpg.de/SDSS/DR7/. The stellar surface mass density is
defined as $\mu_* = M_*/(2\pi R_{50,z}^2)$, where $R_{50,z}$ is the physical
radius in units of kpc that contains half the total light in the z-band. The NUV
magnitude is provided by the GALEX pipeline and the NUV - r colour is corrected
for Galactic extinction following Wyder et al. (2007) with
$A_{{\rm NUV}-r} = 1.9807\,A_r$, where $A_r$ is the extinction in r-band derived
from the dust maps of Schlegel et al. (1998). The g - i colour gradient is defined
as $\Delta_{g-i} = (g-i)_{\rm out} - (g-i)_{\rm in}$, where $(g-i)_{\rm in}$ and
$(g-i)_{\rm out}$ are the g - i colours in the inner and outer regions of the
galaxy. The inner region is enclosed by $R_{50,r}$, the radius containing half the
r-band light. The outer region is defined as the area between $R_{50}$ and
$R_{90}$, the radius enclosing 90 per cent of the r-band light. A negative value
of $\Delta_{g-i}$ implies that the outer region of the galaxy is bluer than the
inner region.
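The three derived quantities defined above are simple enough to write down directly. The following is a minimal sketch; the function names, the example galaxy and the assumption that the input NUV and r magnitudes have not yet been corrected for Galactic extinction are ours and serve only as illustration.

```python
import numpy as np

def surface_mass_density(mstar_msun, r50_z_kpc):
    """mu_* = M_* / (2 pi R_50,z^2), in M_sun kpc^-2."""
    return mstar_msun / (2.0 * np.pi * r50_z_kpc ** 2)

def nuv_r_corrected(nuv_mag, r_mag, a_r):
    """Observed NUV - r colour minus the extinction term A_(NUV-r) = 1.9807 A_r.

    Assumes nuv_mag and r_mag are not yet corrected for Galactic extinction.
    """
    return (nuv_mag - r_mag) - 1.9807 * a_r

def colour_gradient(g_in, i_in, g_out, i_out):
    """Delta_(g-i) = (g - i)_out - (g - i)_in; negative means a bluer outer region."""
    return (g_out - i_out) - (g_in - i_in)

# Hypothetical galaxy; all input values are invented for illustration.
mu = surface_mass_density(1e10, 2.0)
colour = nuv_r_corrected(nuv_mag=19.5, r_mag=16.0, a_r=0.1)
grad = colour_gradient(g_in=17.2, i_in=16.1, g_out=18.0, i_out=17.1)
print(f"log10(mu_*) = {np.log10(mu):.2f}, NUV - r = {colour:.3f}, "
      f"Delta(g-i) = {grad:.2f}")
```

In this invented example the outer region has g - i = 0.9 and the inner region g - i = 1.1, so the gradient is negative and the outer disk is bluer.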



2.4 Semi-analytic model galaxy catalogues and 
mock SDSS samples 

In this paper we compare our observational results to predictions from the galaxy
formation models of G11 and F10. Both models were created by implementing simple
prescriptions for baryonic astrophysics on merger trees that follow the evolution
of the halo/subhalo population in the Millennium Simulation (MS;
Springel et al. 2005), a cubic region 500 $h^{-1}$ Mpc on a side with mass
resolution ~$10^{10}\,M_\odot$.

The G11 model is the most recent semi-analytic model from the Munich group, in
which the treatments of supernova feedback, galaxy size, photoionization
suppression and environmental effects on satellite galaxies have been
significantly updated. G11 demonstrated that their model provided excellent fits
not only to the luminosity and stellar mass functions of galaxies derived from
SDSS data, but also to recent determinations of the abundance of faint satellite
galaxies around the Milky Way. The clustering properties of galaxies as a function
of stellar mass predicted by the model are in good agreement with SDSS data for
masses above $6\times10^{10}\,M_\odot$ and at separations larger than 2 Mpc. On
smaller scales, lower mass galaxies are predicted to be substantially more
clustered than observed.

The F10 model is based on an earlier version of the Munich semi-analytic code,
which is described in detail in Croton et al. (2006) and updated in
De Lucia & Blaizot (2007, hereafter DB07). The main new aspect of this model is
that galactic discs are represented by a series of concentric rings in order to
track the evolution in the gas and stellar surface density profiles of galaxies
over cosmic time. In addition, two simple prescriptions for molecular gas
formation processes are included: one is based on the analytic calculations by
Krumholz et al. (2009), and one is a prescription where the H2 fraction is
determined by the pressure of the interstellar medium (Blitz & Rosolowsky 2006).
The model is currently being configured to operate on the latest code of G11. The
comparison in this paper will be restricted to the model published in the F10
paper and to the H2 formation prescription of Krumholz et al. (2009).

In this paper, the different treatments of gas stripping in the G11 and F10 models
are of interest to us. In most SAMs including DB07, hot gas in a halo is assumed
to be stripped immediately after the halo has been accreted on to a larger halo.
In the G11 model this prescription has been modified. Satellite galaxies that are
still attached to a subhalo within the larger virialized "parent" halo are still
able to accrete gas. This new treatment was motivated by observational findings
and hydrodynamical simulations which revealed that the hot atmosphere of massive
satellite galaxies may survive for a considerable time after accretion (see G11
and references therein for details). This change primarily affects satellite
galaxies located in the outer regions of their host dark matter halos. The
timescale for gas to be depleted and star formation to stop becomes significantly
longer.

We have constructed a set of 50 mock SDSS galaxy catalogues from the G11 model
using both the sky mask and the magnitude and redshift limits of our SDSS
reference sample. A detailed description of our methodology can be found in
Li et al. (2006b) and Li et al. (2007). These mock catalogues allow us to derive
realistic error estimates for the statistics measured below, including both
sampling and cosmic variance uncertainties.



3 ESTIMATING HI MASS FRACTIONS FOR 
THE SDSS GALAXIES 

There have been a number of attempts to calibrate colours (e.g. Kannappan 2004) or
emission-line equivalent widths (e.g. Tremonti et al. 2004; Erb et al. 2006;
Bouché et al. 2007) as proxies for the gas-to-stellar mass ratio in galaxies.
Zhang et al. (2009) proposed a method motivated by the Kennicutt-Schmidt star
formation law (Schmidt 1963; Kennicutt 1998) that combines colour and surface
brightness to estimate the HI-to-stellar mass ratio. They used a sample of 800
galaxies with HI mass measurements from the HyperLeda catalogue
(Paturel et al. 2003) and optical photometry from the SDSS to calibrate a relation
linking these quantities. In subsequent work, C10 used an unbiased sample of
galaxies with HI measurements from GASS to show that $M_{\rm HI}/M_*$ can be well
approximated by a linear combination of NUV-to-optical colour (NUV - r) and
stellar surface mass density ($\mu_*$) with a $1\sigma$ scatter of ~0.3 dex.
However, as could be seen from Figure 12 of C10, galaxies detected by the much
shallower ALFALFA survey in the same redshift range as the GASS sample had
significantly higher HI mass fractions, and were also systematically displaced
from the C10 plane. This result would seem to imply that the HI masses of the most
gas-rich galaxies in the local Universe cannot be reliably inferred from their
UV/optical properties.

However, a recent study by Wang et al. (2011) focusing on the HI-rich galaxies
from the GASS and ALFALFA samples has revealed that unusually HI-rich galaxies
have bluer-than-average outer disks. Motivated by this finding, we now propose an
updated photometric estimator that includes both stellar mass and the gradient in
g - i colour as additional parameters.

Figure 1. Our different galaxy samples in the two-dimensional plane of NUV - r
colour and stellar surface mass density. The grey scale indicates the location of
the 36,136 galaxies with $9.5 < \log_{10}(M_*/M_\odot) < 11$ from the SDSS DR7.
The green points are GASS galaxies with HI detections. The blue points are
ALFALFA-detected galaxies in the same redshift range as the GASS sample
(0.025 < z < 0.05). The squares indicate the grid centers of the HI stacking
analysis of an optically selected, volume-limited sample of 5000 galaxies carried
out by Fabello et al. (2011). The cyan and red squares indicate the areas of the
grid where a high-significance measurement of the mean HI mass fraction could be
obtained from the stacked spectrum. The purple squares indicate the regions where
only an upper limit could be derived.

In Figure 1 we plot some of the galaxy samples we will be working with in this
paper in the two-dimensional plane of NUV - r colour and stellar surface mass
density. The grey scale indicates the location of the 36,136 galaxies with
$9.5 < \log_{10}(M_*/M_\odot) < 11$ from the SDSS DR7. The green points are GASS
galaxies with HI detections. The blue points are ALFALFA-detected galaxies in the
same redshift range as the GASS sample (0.025 < z < 0.05). Finally, the squares
indicate the grid centers of the HI stacking analysis of an optically selected,
volume-limited sample of 5000 galaxies carried out by Fabello et al. (2011). The
cyan and red squares indicate the areas of the grid where a high-significance
measurement of the mean HI mass fraction could be obtained from the stacked
spectrum. The purple squares indicate the regions where only an upper limit could
be derived due to poor statistics. There are 3 grid centers located well within
the "red sequence" with good mean HI mass measurements, and these provide a check
on whether our HI mass fraction estimators work well in the regime where galaxies
are gas-poor on average.

As can be seen, the combination of the GASS and ALFALFA data, as well as the
stacked results, covers the region of NUV - r versus $\log_{10}\mu_*$ parameter
space reasonably well. The GASS galaxies and stacked results are offset to
somewhat higher values of stellar surface mass density, because these samples are
restricted to galaxies with stellar masses larger than $10^{10}\,M_\odot$.

© 2012 RAS, MNRAS 000

Figure 2. [Axis label in both panels:
$-0.322\log\mu_* - 0.234({\rm NUV}-r) + 2.817$.] In the left-hand panel we plot in
red open circles the best-fitting relation between the HI mass fraction
($M_{\rm HI}/M_*$) and the linear combination of surface mass density $\mu_*$,
NUV - r colour, stellar mass $M_*$ and the gradient in g - i colour
($\Delta_{g-i}$) determined from the HI-detected galaxies in the GASS survey. The
relation is given in red on the bottom of the panel. This is compared to the grey
dots, which show the relation obtained by Catinella et al. (2010) between
$M_{\rm HI}/M_*$ and the linear combination of $\mu_*$ and NUV - r (given in grey
on the top of the panel). Cyan and green diamonds are results for the stacked
spectra (green diamonds are for the 3 stacks on the red sequence). The right-hand
panel shows the same thing for a sample of HI-rich galaxies from the ALFALFA
survey. The solid and dashed lines in both panels indicate the 1:1 relation and
$1\sigma$ error region for the new estimator.

Our new estimator is defined by

$$\log_{10}(M_{\rm HI}/M_*) = a\,\log_{10}\mu_* + b\,({\rm NUV}-r) + c\,\log_{10}(M_*/M_\odot) + d\,\Delta_{g-i} + e, \qquad (1)$$

where $\Delta_{g-i}$ is the colour gradient defined in Section 2.3. The
coefficients are determined by minimizing the residuals from the plane using the
293 HI detections in the GASS sample. Following C10, when carrying out the fit, we
weight each galaxy by the mass-dependent selection function of the GASS survey.
The $1\sigma$ scatter in our new estimator is 0.31 dex, very similar to that of
the old one. Figure 2 illustrates how this new estimator improves the HI mass
fraction estimates.
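A weighted linear fit of the form of equation (1) can be sketched as follows. This is only an illustration of the technique under stated assumptions: we use ordinary weighted least squares via `numpy.linalg.lstsq`, the data are synthetic, the weights are a random stand-in for the GASS selection function, and the "true" coefficients are arbitrary placeholders rather than the fitted values of the paper.

```python
import numpy as np

def design_matrix(log_mu, nuv_r, log_mstar, delta_gi):
    """Columns follow the terms of equation (1): a, b, c, d and the constant e."""
    log_mu = np.asarray(log_mu, dtype=float)
    return np.column_stack([log_mu, nuv_r, log_mstar, delta_gi,
                            np.ones_like(log_mu)])

def fit_plane(x, log_fhi, weights):
    """Weighted least squares: minimise sum_i w_i * (y_i - x_i . coeffs)^2."""
    sw = np.sqrt(np.asarray(weights, dtype=float))
    coeffs, *_ = np.linalg.lstsq(x * sw[:, None],
                                 np.asarray(log_fhi, dtype=float) * sw,
                                 rcond=None)
    return coeffs

def predict_log_fhi(x, coeffs):
    """Predicted log10(M_HI/M_*) for each row of the design matrix."""
    return x @ coeffs

# Synthetic demonstration with 293 mock galaxies (not GASS data).
rng = np.random.default_rng(0)
n = 293
x = design_matrix(rng.uniform(7.5, 9.5, n), rng.uniform(1.0, 6.0, n),
                  rng.uniform(10.0, 11.5, n), rng.uniform(-0.4, 0.2, n))
true_coeffs = np.array([-0.3, -0.2, -0.4, -0.5, 6.0])  # arbitrary placeholders
log_fhi = predict_log_fhi(x, true_coeffs)               # noise-free mock "data"
weights = rng.uniform(0.5, 2.0, n)                      # stand-in for selection weights
fitted = fit_plane(x, log_fhi, weights)
scatter = np.std(log_fhi - predict_log_fhi(x, fitted))
```

With noise-free mock data the fit recovers the input coefficients and the scatter is zero to numerical precision; with real data the standard deviation of the residuals plays the role of the quoted $1\sigma$ scatter.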

In the left-hand panel of the figure, grey dots show the HI plane of C10 for GASS
galaxies, while coloured stars show the stacked results^1. In the right-hand panel
of the figure, grey dots show the same C10 plane for a sample of 7000 galaxies
from the $\alpha$.40 catalogue of the ALFALFA survey (Haynes et al. 2011) with
stellar masses above 10^[...] $M_\odot$ and redshifts below 0.06. As can be seen,
the majority of the grey points in the right panel lie above the relation.

The HI plane given by the new estimator in equation (1) is plotted in red open
circles in both panels of Figure 2. There is rather little change for the majority
of galaxies in the GASS sample. However, the HI-rich galaxies in the ALFALFA
sample that were previously displaced to higher-than-predicted HI mass fractions
are now mostly located well within the $1\sigma$ region of the new relation.

^1 Note that the bins on the red sequence have been coloured in green and lie very
close to an extrapolation of the best-fit line through the other bins, indicating
that the C10 plane still yields an accurate prediction of mean HI mass fraction
for galaxies on the red sequence.

We note that this reduction in the systematic offset for HI-rich galaxies could
not be achieved by introducing a single new parameter into the fit (i.e. only
$\Delta_{g-i}$). Equation (1) implies that the predicted gas fraction scales more
strongly with colour gradient in high mass galaxies than in low mass galaxies. The
most likely reason for this is that massive galaxies have larger bulge-to-disk
ratios than less massive galaxies. Fabello et al. (2011) showed that the HI
content of a galaxy did not depend on its bulge-to-disk ratio; the HI mass
fraction only depended on the properties of the disk. It is thus likely that the
HI fraction correlates with the colour gradient of the disk and the bulge is a
contaminant when determining the colour gradient. At present, we do not have
bulge/disk decompositions for all the galaxies in our sample, so we do not
investigate this hypothesis in more depth. Another effect that may be important is
that massive galaxies contain more dust, and this may change the relation between
colour gradient and HI fraction.

We now test whether our new estimator exhibits any remaining systematic biases by
checking whether the residuals are correlated with any intrinsic galaxy property.
In Figure 3 we plot the residuals for the C10 estimator (grey) and for the new
estimator (red) as a function of $M_*$, $\mu_*$, NUV - r, and $R_{90}/R_{50}$. We
only show results for the ALFALFA sample, where the new estimator does change the
HI mass fraction predictions by a significant amount.

Figure 3 shows that the new estimator leads to a significant reduction in the
large positive residuals for HI-rich galaxies with low masses and stellar surface
mass densities, blue colours and low concentration indices. The new estimator does
not reduce the residuals for galaxies with high stellar surface densities, red
colours and high concentration indices. There is still a sub-population of such
galaxies that are HI-rich and where equation (1) fails to predict the HI content
accurately. An example of such a system is discussed briefly in C10. In addition,
one might worry that the HI mass fraction estimation may be biased in this regime,
because many red galaxies are not detected in either the GASS or the ALFALFA
survey.

Figure 3. Distributions of the residuals in the predicted $M_{\rm HI}/M_*$ are
plotted as functions of stellar mass $M_*$, stellar surface mass density $\mu_*$,
NUV - r colour and concentration index $R_{90}/R_{50}$ for the estimator of
Catinella et al. (2010) (grey dots) and for the new estimator in equation (1)
(coloured circles). In the latter case the galaxies are divided into four stellar
mass intervals and are plotted in different colours (cyan:
$\log_{10}(M_*/M_\odot) < 9$; blue: $9 < \log_{10}(M_*/M_\odot) < 9.5$; green:
$9.5 < \log_{10}(M_*/M_\odot) < 10$; red: $\log_{10}(M_*/M_\odot) > 10$).

In the left-hand panel of Figure 2 we plot the results of the HI stacking analysis
by Fabello et al. (2011). By stacking samples of a few hundred galaxies,
Fabello et al. (2011) were able to estimate mean HI mass fractions for galaxies
with NUV - r colours in the range 4-6 (shown as green stars on the plot). As can
be seen, the C10 estimator accurately reproduces the stacked results with a
$1\sigma$ scatter less than 0.07 dex, even for the reddest stacks. Unfortunately,
a similar test is not possible for our new estimator, because the sample of SDSS
galaxies with available ALFALFA coverage is too small to carry out a stacking
analysis using four different galaxy parameters instead of two. Therefore, in what
follows, we divide our galaxies into "red" or "blue" using a mass-dependent colour
divider:



$$({\rm NUV}-r)_{\rm cut} = 0.5\,\log_{10}(M_*/M_\odot). \qquad (2)$$

For "red" galaxies with NUV - r > $({\rm NUV}-r)_{\rm cut}$ we use the old
estimator, while the new estimator is applied to "blue" galaxies with
NUV - r < $({\rm NUV}-r)_{\rm cut}$.
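The red/blue switch between the two estimators can be written as a short selection rule. The sketch below takes equation (2) as it appears in the text; the function names and the example values are hypothetical, and a galaxy lying exactly on the divider (a case the text does not specify) is assigned to the new estimator here by assumption.

```python
import numpy as np

def nuv_r_cut(log_mstar):
    """Equation (2): mass-dependent NUV - r divider."""
    return 0.5 * np.asarray(log_mstar, dtype=float)

def combined_estimate(nuv_r, log_mstar, log_fhi_old, log_fhi_new):
    """Old (two-parameter) estimate for red galaxies, new estimate for blue ones."""
    is_red = np.asarray(nuv_r, dtype=float) > nuv_r_cut(log_mstar)
    # Galaxies exactly on the divider fall through to the new estimator (assumption).
    return np.where(is_red, log_fhi_old, log_fhi_new)

# Two hypothetical galaxies at log10(M_*/M_sun) = 10, where the divider is NUV - r = 5.
picked = combined_estimate(nuv_r=[4.0, 5.5], log_mstar=[10.0, 10.0],
                           log_fhi_old=[-1.0, -1.8], log_fhi_new=[-0.7, -1.5])
```

Here the first galaxy is blue and keeps its new-estimator value, while the second is red and falls back to the old estimator.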

Figure 4. Projected two-point cross-correlation function, w_p(r_p), for six subsamples with equal number of galaxies from the GASS,
selected by the observed HI mass fraction (triangles connected by blue dotted lines) or the predicted value (squares connected by red
solid lines). The average value of the HI mass fraction of each subsample is indicated. The result for the whole sample is plotted as the
black solid line and repeated in every panel for reference.

Figure 6. The projected two-point correlation function w_p(r_p) for galaxies in bins of stellar mass. Results for the SDSS DR7 sample
are shown in black triangles, while those for the Fu et al. (2010) and Guo et al. (2011) models are shown in red solid lines and in blue
dashed lines, respectively. Errors on the SDSS results are estimated from a set of 50 mock galaxy surveys that have the same selection
effects as the real SDSS sample.

We now carry out a test to see whether the estimator
introduces any systematic bias in galaxy clustering analyses.
We divide all the GASS galaxies, including non-detections,
into six subsamples of equal size, using both the measured
value of M_HI/M_* and the predicted value, and we
compute the projected two-point cross-correlation functions
(2PCCF), w_p(r_p), of these subsamples with the SDSS
reference sample. We find that it is very important to take
into account the effect of the errors in the predicted value of
M_HI/M_* when comparing the results using the photometric
estimator with the results using the real HI measurements.
The effect of errors is to weaken the clustering trends as a
function of M_HI/M_*, particularly in the tails of the
distribution. Here, we model the effect of the errors by adding a
random component to the measured value of log10(M_HI/M_*) that
follows a Gaussian distribution function with a width of 0.31
dex. Results using the measured HI mass fractions convolved
with the Gaussian distribution of errors are shown in blue in
Figure 4, while results using the photometric estimator are
shown in red. The errors in the w_p(r_p) measurements are
computed using the bootstrap resampling technique. The
2PCCF for the whole GASS sample is plotted as a black
solid line in each panel for reference. Figure 4 shows that
the two w_p(r_p) calculations agree well with each other.
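
The bootstrap error estimate used for the w_p(r_p) measurements can be illustrated with a minimal sketch. It assumes a generic `statistic` callable standing in for the full clustering measurement; the function name `bootstrap_error`, the number of resamples and the random seed are illustrative choices, not values taken from this paper.

```python
import random
import statistics


def bootstrap_error(sample, statistic, n_resamples=200, seed=0):
    """Standard deviation of `statistic` over bootstrap resamples of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    values = []
    for _ in range(n_resamples):
        # Draw n objects with replacement from the original sample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        values.append(statistic(resample))
    return statistics.stdev(values)


# Toy usage: error on the mean of a small list of made-up numbers.
toy_sample = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2]
toy_error = bootstrap_error(toy_sample, statistics.mean)
```

In a real analysis the statistic would be the w_p(r_p) value in one r_p bin, recomputed for each resampled galaxy list.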

Although the GASS sample is small, we can still see that
both the amplitude and shape of the 2PCCF show strong
systematic trends with increasing HI mass fraction. HI-rich
galaxies are less strongly clustered on all scales, with more
pronounced 1-halo to 2-halo transitions at ~1 Mpc. Since
galaxy clustering depends on a variety of physical properties,
in particular on stellar mass, it is unclear to what extent the
effect seen in Figure 4 is due to HI content only. We will
address this point in the next section.

Figure 5. Logarithm of the mean HI mass fraction is plotted
as a function of the logarithm of the stellar mass for galaxies
in the GASS survey and in the models of
Fu et al. (2010; red) and Guo et al. (2011; blue). The 1σ scatter
in log10(M_HI/M_*) is indicated by the grey shaded area for the
data, and by red/blue dashed curves for the models.



4 CLUSTERING AS A FUNCTION OF M_HI/M_*
AND COMPARISONS WITH
SEMI-ANALYTIC MODELS

In this section we apply our new photometric estimator to
our full SDSS DR7 galaxy sample to study the dependence of
clustering on HI mass fraction. We compare our results with
predictions from the G11 and F10 models. It is well known
that clustering depends strongly on galaxy stellar mass, so
the analyses are always carried out in narrow mass intervals.
In order to take errors in the photometric estimator into
account, we convolve the HI mass fractions predicted by the
models with a Gaussian distribution function of width 0.3
dex in log10(M_HI/M_*).
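
The convolution with estimator errors can be sketched as adding an independent Gaussian deviate to each model value of log10(M_HI/M_*). This is a minimal illustration only: the function name `add_log_scatter`, the seed, and the toy input list are assumptions introduced here, and only the 0.3 dex width comes from the text.

```python
import random
import statistics


def add_log_scatter(log_fractions, sigma_dex=0.3, seed=42):
    """Add Gaussian scatter (in dex) to a list of log10 mass fractions."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma_dex) for x in log_fractions]


# Toy usage: 20,000 model galaxies that all start at log10(M_HI/M_*) = -1.
model_log_fractions = [-1.0] * 20000
scattered = add_log_scatter(model_log_fractions, sigma_dex=0.3)

# The recovered scatter should be close to the input width of 0.3 dex.
recovered_sigma = statistics.stdev(scattered)
```

Because the scatter is applied in the logarithm, it broadens the tails of the distribution symmetrically in dex, which is what weakens clustering trends in the extreme percentile bins.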

Before we begin, we demonstrate that the models
reproduce average trends in HI mass fraction as a function
of stellar mass reasonably well. In Figure 5 the black solid
curve shows the median value of log10(M_HI/M_*) as a
function of stellar mass for galaxies in the GASS survey, while
the grey shaded region indicates the 16th to 84th percentile
range of log10(M_HI/M_*). Note that the galaxies without HI
line detections are assigned an HI mass equal to the upper
limit. This is why the black curve and the shaded region do
not fall below log10(M_HI/M_*) ~ -1.82 (see C10 for details
on the detection limits of the survey). We now perform the
same analysis for the simulated galaxies and the results are
shown in red and blue for the F10 and G11 models,
respectively. We see that the G11 model yields a higher median
value of HI gas mass fraction at a given value of M_*, when
compared to both the data and the F10 model.

There are two reasons for this: 1) The G11 model does
not account for the partition of the neutral gas into
different components. The F10 model includes simple
prescriptions for the formation of molecular gas and also properly
takes into account the contribution of helium when making
predictions for HI content. 2) The F10 model parameters
are explicitly adjusted so as to match the HI mass
functions determined by existing HI surveys like HIPASS and
ALFALFA. Kauffmann et al. (2012) have shown that the
F10 model can also reproduce the distribution of M_HI/M_*
for the population of galaxies with detectable gas, but the
model does not provide a fully accurate description of the
population of galaxies without detectable cold gas. We note
that Figure 5 includes both populations, so the fit to the
data is not as good as that shown in Figure 2 of the
Kauffmann et al. paper. Since the GASS is a survey only for high
mass galaxies with log10(M_*/M⊙) > 10, we are not able to
extend this comparison to lower masses. We have compared
the models to the data using our pseudo HI mass fractions,
and this shows that the models roughly match the data at
lower masses as well.

Next, Figure 6 demonstrates that both the G11 and
F10 models reproduce the observed dependence of w_p(r_p)
on stellar mass. The agreement with observations is equally
good for both models on large scales. The F10 model appears
to provide a somewhat better fit to the clustering amplitude
on scales below ~1 Mpc.

In order to carry out meaningful comparisons between
data and models, we order all the galaxies in a given stellar
mass range by increasing M_HI/M_* and divide the galaxies
into 10 subsamples, each containing 10 per cent of the
whole sample. We analyze the dependence of w_p(r_p) on HI
mass fraction as a function of HI mass fraction percentile
instead of the absolute value of HI mass fraction (see footnote).
In order to provide a more intuitive feel for our results, we
present our measurements in terms of a bias factor, defined as
the ratio of the w_p(r_p) for a given HI-selected subsample to
the w_p(r_p) of all galaxies in the corresponding stellar mass
range. In Figure 7 we plot this bias factor as a function of
percentile in log10(M_HI/M_*), with HI mass fraction increasing
from left to right. Results for different intervals in stellar
mass are shown in different rows, while results evaluated on
different projected scales are shown in different columns. The
data are shown as black curves with shaded regions indicating
the 1σ errors that are estimated from the bias factor
measurements of the 50 mock SDSS catalogues, while the F10
and G11 models are shown in red circles and blue triangles,
respectively.
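
The percentile binning and the bias factor defined above can be sketched as follows. This is a minimal illustration: the function names `percentile_bins` and `bias_factor` and all numbers in the toy usage are made up for the example and are not measurements from this paper.

```python
def percentile_bins(values, n_bins=10):
    """Split indices into n_bins groups of (nearly) equal size, ordered by value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    return [order[(k * n) // n_bins:((k + 1) * n) // n_bins]
            for k in range(n_bins)]


def bias_factor(wp_subsample, wp_all):
    """Ratio of w_p(r_p) for a subsample to that of the full mass bin, per r_p bin."""
    return [ws / wa for ws, wa in zip(wp_subsample, wp_all)]


# Toy usage with made-up numbers (not measurements from this paper).
log_fractions = [-1.6, -0.2, -0.9, -1.1, -0.5, -1.8, -0.7, -1.3, -0.4, -1.0]
bins = percentile_bins(log_fractions, n_bins=5)
toy_bias = bias_factor([30.0, 12.0, 4.0], [20.0, 10.0, 4.0])
```

Working in percentiles rather than absolute values means that data and models are compared at the same rank in HI mass fraction, even if their absolute distributions differ.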

As can be seen, the bias always decreases with increasing
HI mass fraction. The trend is strongest for low mass
galaxies and on scales of around 100-200 h^-1 kpc. The
agreement between observations and models is good, except for
the lowest mass galaxies. For galaxies with stellar masses
in the range 3 x 10^9 M⊙ to 10^10 M⊙, the observed bias factor
drops steeply as a function of HI mass fraction percentile,
and then flattens. The bias factor predicted by the models
exhibits a more linear trend as a function of HI mass fraction
percentile.

Footnote: We note that this only makes sense for the G11 model if the
intrinsic scatter in HI-to-H2 ratio in real galaxies does not change
the ranking of M_HI/M_* with respect to (M_HI + M_H2)/M_*.
Saintonge et al. (2011) show that the average value of M_H2/M_HI
is around 1/3 and the molecular gas mass very rarely exceeds the
atomic gas mass, so this is likely to be close to correct.

Figure 7. Panels on the left-hand side show the clustering bias factor as a function of percentile in log10(M_HI/M_*), at different separations
(panels from left to right) and for different stellar mass ranges (panels from top to bottom). The shaded regions in the left-hand panels
indicate the errors on the observed bias factors estimated from the mock catalogues. Model results are shown in red/blue for the F10/G11
models. In the right-most column, we plot the central galaxy fraction as a function of percentile in log10(M_HI/M_*). The black curves show
the central fractions estimated from the data (see text). The dashed red/blue curves show the true central fractions from the models.
The solid red/blue curves show the central fractions in the models when estimated in the same way as in the data.

As we will now show, this decrease in bias can be
understood in a simple way in terms of an increasing ratio
of central-to-satellite galaxies as a function of increasing HI
mass fraction. To show that this is the case, we classify each
galaxy in our sample as either a central galaxy or a satellite
galaxy based on whether it is more massive than all
companions within a cylinder with projected radius R_max and
a line-of-sight depth of ±1000 km s^-1. Here, R_max is set to
twice the virial radius of the host dark matter halo of the
galaxy. We have adopted the stellar mass-halo mass relation
derived by Guo et al. (2010) to estimate a halo mass for the
galaxy, and then estimate a 'virial' radius of the halo using
the model of Eke et al. (2001). In addition, we require that
a central galaxy should not fall within R_max of any other
more massive galaxy.
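
The central/satellite classification described above can be sketched as follows. This is a minimal illustration under stated assumptions: positions are flat-sky projected coordinates in Mpc, R_max is supplied per galaxy rather than derived from a halo model, and the same ±1000 km s^-1 velocity window is assumed to apply when testing whether a galaxy lies within R_max of a more massive neighbour. The function name `classify_centrals` and the dictionary keys are hypothetical.

```python
import math


def classify_centrals(galaxies, dv_max=1000.0):
    """Return a list of booleans: True if the galaxy is classified as a central.

    Each galaxy is a dict with keys:
      x, y   projected position (Mpc), flat-sky assumption
      v      line-of-sight velocity (km/s)
      mass   stellar mass (any consistent unit, or its logarithm)
      r_max  cylinder radius (Mpc), e.g. twice an estimated virial radius
    """
    flags = []
    for i, g in enumerate(galaxies):
        is_central = True
        for j, other in enumerate(galaxies):
            # Only more massive neighbours can demote a galaxy to satellite.
            if i == j or other["mass"] <= g["mass"]:
                continue
            # Skip neighbours outside the line-of-sight velocity window.
            if abs(other["v"] - g["v"]) > dv_max:
                continue
            sep = math.hypot(other["x"] - g["x"], other["y"] - g["y"])
            # Satellite if the neighbour lies in this galaxy's cylinder, or
            # this galaxy lies within R_max of the more massive neighbour.
            if sep < g["r_max"] or sep < other["r_max"]:
                is_central = False
                break
        flags.append(is_central)
    return flags


# Toy usage with made-up galaxies.
toy_galaxies = [
    {"x": 0.0, "y": 0.0, "v": 0.0, "mass": 11.0, "r_max": 1.0},
    {"x": 0.5, "y": 0.0, "v": 200.0, "mass": 10.0, "r_max": 0.3},
    {"x": 5.0, "y": 0.0, "v": 0.0, "mass": 10.0, "r_max": 0.3},
    {"x": 0.1, "y": 0.0, "v": 5000.0, "mass": 9.5, "r_max": 0.2},
]
flags = classify_centrals(toy_galaxies)
```

The double loop is quadratic in the number of galaxies; a spatial index would be needed for a sample of tens of thousands of objects, but the logic of the test is unchanged.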

In the right-hand panels in Figure 7 we plot the
fraction of central galaxies, f_cen, as a function of HI fraction
percentile (the black solid line). We see that f_cen increases
with increasing HI content, with the effect stronger at low
stellar masses.

We can of course check whether the models predict a
similar increase in f_cen as a function of HI gas fraction. In
the right column of Figure 7 the dashed curves show the
true values of f_cen as a function of HI fraction percentile for
the F10 (red) and G11 (blue) models. The central fractions
in the F10 model are lower than those in the G11 model
at low HI mass fractions, particularly for galaxies with low
stellar masses. This reflects the fact that gas consumption
times in satellite galaxies are longer in the G11 model than
in the F10 model.

In order to make a fair comparison with observations,
we have also computed f_cen for the model galaxies in exactly
the same way as in the observations. In brief, we project
the model galaxies onto the x-y plane and take the
z-axis as the line-of-sight direction (i.e. we adopt the distant
observer approximation). We then apply exactly the same
procedure described above to classify each galaxy as central
or satellite. The z-axis peculiar velocities of the galaxies
are added to their z-axis positions. In addition, halo masses
and virial radii are not taken from the model catalogue, but
are estimated in exactly the same way as for the observational
data. Results are plotted as solid red and blue curves in the
right-hand panels of Figure 7.
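
Adding the z-axis peculiar velocity to the z-axis position under the distant observer approximation can be sketched as below. The sketch assumes positions in h^-1 Mpc, velocities in km s^-1, a redshift close to zero (so the scale factor is taken as 1 and the Hubble parameter as 100 h km s^-1 Mpc^-1), and a periodic simulation box; the function name `redshift_space_z` and the box size in the usage lines are hypothetical.

```python
def redshift_space_z(z_mpc_h, vz_km_s, box_size_mpc_h, hubble=100.0):
    """Shift a z-axis position by the peculiar velocity along the line of sight.

    z_mpc_h         real-space position along the z-axis (h^-1 Mpc)
    vz_km_s         peculiar velocity along the z-axis (km/s)
    box_size_mpc_h  side length of the periodic box (h^-1 Mpc)
    hubble          Hubble parameter in km/s per h^-1 Mpc (100 at z ~ 0)
    """
    # Convert the velocity to a distance and wrap around the periodic box.
    return (z_mpc_h + vz_km_s / hubble) % box_size_mpc_h


# Toy usage with a hypothetical 500 h^-1 Mpc box.
shifted = redshift_space_z(10.0, 500.0, 500.0)
wrapped = redshift_space_z(499.0, 300.0, 500.0)
```

After this shift, line-of-sight separations between model galaxies can be converted back to velocity differences and compared with the ±1000 km s^-1 cylinder depth used for the data.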

We note that the true central fractions are always
smaller than the ones that use a classification technique
based on whether or not brighter companions are found
in cylinders around the galaxy. However, our classification
technique preserves the shape of the relation between
central fraction and HI gas fraction percentile, as well as the
differences between the F10 and G11 models. We also see that
the central fractions estimated in cylinders in the simulation
agree reasonably well with the data. As was the case for the
bias factor, the behaviour of the central fraction as a function
of HI mass fraction percentile in the models is somewhat
different to what is seen in the observations, particularly at
low stellar masses.

In summary, the general agreement with the models
supports our conjecture that the trends in bias factor as
a function of HI mass fraction mainly arise as a result of
trends in the satellite-to-central ratio.



5 SCALE DEPENDENCE OF THE BIAS FOR 
GALAXIES WITH EXCESS/DEFICIENT HI 
CONTENT 

In the previous section, we studied how the bias factor
changes as a function of normalized HI mass fraction for
galaxies in different stellar mass bins. The results presented
in Figure 7 clearly show that the change in the bias
factor between "gas-poor" and "gas-rich" galaxies depends on
scale r_p.

In this section, we analyze the scale-dependence of the
change in bias factor for both gas-deficient and gas-rich
galaxies. We note that Haynes & Giovanelli (1984) defined
gas-deficiency to be the difference in HI content between
cluster galaxies and "field" galaxies of the same
morphological type and size. There are some difficulties with this
definition, including the definition of "field" and the fact
that Hubble classification is problematic in rich clusters.
Some more recent analyses have used a type-independent
deficiency parameter that compares all galaxies to a fixed
HI surface density (Chung et al. 2009). One worry with this
is that the mean HI content of galaxies scales strongly with
galaxy parameters such as stellar mass and surface density
(Catinella et al. 2010).

In this analysis, we will adopt a flexible approach to
defining "pseudo" HI deficiency parameters. We will
analyze the photometrically-predicted HI content with respect
to galaxies of the same stellar mass, with respect to galaxies
of the same mass and size, and with respect to galaxies of
the same mass, size and colour (as in Cortese et al. 2011).

In the upper panels of Figure 8 we show the change
in bias factor between the 10th and 50th percentile bins in
log10(M_HI/M_*). This serves as a test of gas-stripping
mechanisms in gas-deficient galaxies. The lower panels show the
change in bias factor between the 100th and 50th percentile
bins in log10(M_HI/M_*). This serves as a test of gas accretion
mechanisms in gas-rich galaxies.

Results for the SDSS DR7 galaxies are shown as black
curves in Figure 8. Grey shaded regions indicate the 1σ
errors on our estimates, obtained from the 50 mock SDSS
catalogues. Results for the F10 and G11 models are shown
as red and blue curves, and we plot our results in three
different stellar mass ranges. As can be seen from the plot,
the change in bias factor between gas-deficient galaxies and
galaxies with typical gas fractions is most pronounced for
low stellar mass systems. The bias factor difference peaks at
relatively small physical scales (~100-300 kpc). For the
most massive galaxies with 10.5 < log10(M_*/M⊙) < 11,
there is little change in bias on any scale. For galaxies with
9.5 < log10(M_*/M⊙) < 10, the increase in clustering
amplitude from galaxies with typical gas mass fractions to the
most gas-deficient objects reaches a factor of 2 on scales of
a few hundred kpc. On scales larger than 2-3 Mpc, there
is no significant change in clustering amplitude. The results
are consistent with the idea that gas quenching is driven by
processes that are internal to dark matter halos. The
models agree well with the data at stellar masses greater than
10^10 M⊙, but at lower stellar masses the models predict a
weaker bias for gas-deficient galaxies than is actually seen.

As seen in the bottom panels of Figure 8, the change
in bias factor between very gas-rich galaxies and galaxies
with typical gas fractions appears to be weaker rather than
stronger at low stellar masses. The most gas-rich galaxies
with stellar masses greater than 10^10 M⊙ are more weakly
clustered than galaxies with typical gas fractions, indicating
that they occupy lower mass dark matter halos on average.
At stellar masses below 10^10 M⊙, there is no anti-bias of
gas-rich galaxies seen in the data. However, the models do
predict clear anti-bias effects.

One might question whether ranking galaxies by HI
mass fraction is sufficient to characterize whether a galaxy
is classified as gas-rich or gas-deficient. As discussed in § 3,
galaxies of fixed stellar mass and colour have higher HI mass
fractions if they have larger sizes (i.e. lower stellar surface
mass densities). One way to understand this is to appeal to
standard disk formation models (e.g. F10; Kauffmann 1996;
Mo et al. 1998). In these models, the spin parameter of the
dark matter halo determines the contraction factor of the
infalling gas. Larger disks in a dark matter halo of fixed mass
will have higher HI mass fractions because gas surface
densities are low and gas consumption times are long. In this
case, it would make more sense to define galaxies as
gas-rich or gas-deficient by comparing their HI mass fractions to
other galaxies of the same mass and size.

One might also consider an even more stringent
constraint that HI-rich/HI-deficient galaxies be classified as
those objects with higher/lower-than-average HI content
given their stellar mass, size and star formation rate. This
might indicate that the galaxy has experienced a recent gas
accretion episode and that the global star formation has not
yet had a chance to respond to the extra fuel supply. In our
scheme of using photometric quantities to predict HI
content, the HI-rich systems would correspond to those
galaxies with bluer-than-average outer disks. Recall that the HI
content in the gas-poor regime is currently calibrated using only
stellar surface density and colour; we therefore do not delve
into the opposite regime, where gas has been recently
removed from a galaxy.

In Figures 9 and 10 we investigate clustering trends
using these alternative definitions. For Figure 9, we rank
galaxies as a function of their deviation from the average
HI mass fraction of all galaxies of the same stellar mass
(M_*) and stellar surface mass density (μ_*). As seen from
equation (1), this deviation depends on both the NUV-r
colour of the galaxy and its g-i colour gradient. For Figure
10, we rank galaxies as a function of their deviation from
the average HI mass fraction of all galaxies of the same M_*,
μ_* and NUV-r. This then depends only on the g-i colour
gradient of the galaxy.
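
Ranking galaxies by their deviation from the average HI mass fraction at fixed M_* and μ_* can be sketched as below. As a simplifying assumption, the "average at fixed M_* and μ_*" is approximated here by the mean within cells of a regular grid in (log M_*, log μ_*), which stands in for whatever mean relation is actually fitted; the function name `deviation_from_cell_mean`, the 0.25 dex cell width, and the toy values are all hypothetical.

```python
import math
from collections import defaultdict


def deviation_from_cell_mean(log_fgas, log_mstar, log_mu, bin_width=0.25):
    """Deviation of each log10(M_HI/M_*) from the mean of its (M_*, mu_*) cell."""
    cells = defaultdict(list)
    keys = []
    for f, m, mu in zip(log_fgas, log_mstar, log_mu):
        # Assign the galaxy to a grid cell in log stellar mass and log density.
        key = (math.floor(m / bin_width), math.floor(mu / bin_width))
        keys.append(key)
        cells[key].append(f)
    means = {k: sum(v) / len(v) for k, v in cells.items()}
    return [f - means[k] for f, k in zip(log_fgas, keys)]


# Toy usage with made-up values: the first two galaxies share a cell.
toy_dev = deviation_from_cell_mean(
    log_fgas=[-1.0, -0.5, -1.5],
    log_mstar=[10.1, 10.2, 10.9],
    log_mu=[8.6, 8.7, 9.3],
)
```

The resulting deviations can then be ranked and split into percentile bins in the same way as the raw HI mass fractions.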

Interestingly, the top panels of Figure 9 show that
when gas deficiency is expressed relative to galaxies of the
same stellar mass and size, the change in bias on scales
between a few hundred kpc and 1 Mpc becomes much more
pronounced. The change in bias factor for the lowest mass
galaxies now reaches values near ~3, and even massive
gas-deficient galaxies are now significantly biased with respect
to their counterparts with "normal" gas fractions.

Figure 8. In the upper panels, we plot the change in bias factor from the 10th percentile to the 50th percentile in log10(M_HI/M_*) as a
function of projected physical scale, for different stellar mass intervals as indicated. The black line shows the result from the SDSS/DR7.
The errors estimated from the mock catalogues are shown as shaded regions. The red dotted-dashed line and the blue solid line show
results from the Fu et al. (2010) and Guo et al. (2011) models after convolution with errors. The lower panels show the change in bias
factor from the 100th percentile to the 50th percentile.

Figure 9. Same as the previous figure, except that the galaxies are ordered by the deviation in log10(M_HI/M_*) from the value predicted
from the mean relation between log10(M_HI/M_*) and galaxy mass M_* and stellar surface mass density μ_*. Red dotted-dashed curves
show results from the model of Fu et al. (2010) after convolution with errors.

The F10 model provides predictions of the radial
profiles of the gas and the stars in galaxies. It is thus possible to
look at gas deficiency at fixed mass and stellar surface
density in the model. Results are plotted as red curves in
Figure 9. We see that the model disagrees very strongly with the
observations. In the model, bias effects become weaker rather
than stronger when gas deficiency is defined with respect to
galaxies of the same mass and size. These results would
appear to suggest that in the real Universe, gas removal
processes depend on the size/density of the galaxy itself. This
is not the case in the models, where satellite galaxies
become gas-poor only because their supply of infalling gas has
been cut off. Thus the data suggest that processes such as
ram-pressure stripping, which depend on the density of the
interstellar medium (ISM), may play an important part in
explaining the observed trends.

In contrast to what is seen for gas-deficient galaxies,
if gas-richness is normalized with respect to galaxies of the
same stellar mass and size, the bias trends remain much the
same and the F10 model predictions still fit reasonably well
for galaxies more massive than 10^10 M⊙. This suggests that
gas accretion processes are being modelled quite successfully
at high stellar masses.

Figure 10 shows that the bias effects for gas-rich
galaxies are still roughly of the same strength as in the previous
figure, when gas-richness is expressed relative to galaxies of
the same mass, size and global NUV-r colour. This means
that the clustering does depend strongly on the
colour-gradient term for these objects. Galaxies with
bluer-than-average outer colours are clearly located in lower-density
environments compared to galaxies where there is no
evidence for younger-than-average stellar populations in the
outer disk.



6 SUMMARY AND DISCUSSION 

We introduce a new photometric estimator of
the HI mass fraction (M_HI/M_*) in local galaxies. The
estimator is calibrated with a sample of 293 galaxies with
M_* > 10^10 M⊙ in the redshift range 0.025 < z < 0.05, which
are detected in the HI emission line by the GASS survey. The
estimator is a linear combination of four parameters: stellar
mass M_*, stellar surface mass density μ_*, near-UV-to-optical
colour NUV-r, and the gradient in g-i colour Δ_{g-i}. We
demonstrate that this estimator provides unbiased M_HI/M_*
estimates for HI-rich galaxies.

We then apply this estimator to a sample of ~24,000
galaxies from the SDSS/DR7 that lie in the same redshift
range. We analyze the clustering of these galaxies as a
function of stellar mass and as a function of HI mass
fraction M_HI/M_*, and we compare the results with predictions
from two recent semi-analytic models of galaxy formation
by Fu et al. (2010) and Guo et al. (2011). Our results may
be summarized as follows:

• Clustering depends strongly on HI mass fraction at fixed
M_*. Galaxies with large values of M_HI/M_* are more weakly
clustered. The total change with HI mass fraction in
clustering strength is largest for low mass galaxies.

• At fixed M_*, the clustering dependence on HI mass
fraction is strongest on scales of a few hundred kpc. On large
scales (> 1 Mpc), clustering depends weakly on HI mass
fraction. This suggests the HI content of a galaxy of fixed stellar
mass depends on location within its dark matter halo.

• After the uncertainty in the HI mass fraction estimator
is taken into account, the observed dependence of clustering
on HI mass fraction is well reproduced by the models for
galaxies more massive than 10^10 M⊙. Significant
discrepancies remain at lower stellar masses.

In the next part of the paper, we extend the analysis by
studying the clustering of HI-deficient and HI-rich galaxies
defined in two ways: 1) with respect to the average HI
content of galaxies of the same stellar mass and size, 2) with
respect to the average HI content of galaxies of the same
stellar mass, size and NUV-r colour. These definitions are
motivated by the following considerations. First, models in
which disks form from gas that cools and condenses in dark
matter halos, while conserving angular momentum, predict
that the gas fractions of equilibrium disks depend on both
their mass and their size. Second, the majority of nearby
spiral galaxies are observed to lie on a relatively tight plane
linking HI gas mass fraction with stellar mass, stellar
surface density and NUV-r colour (Catinella et al. 2010). It
is natural to suppose that galaxies that have undergone a
recent gas accretion event would be displaced to higher HI
mass fractions with respect to this plane. Conversely,
galaxies that have been stripped of their gas would be displaced
to lower values of M_HI/M_*.

The main results of our analysis of HI-rich and
HI-deficient galaxies can be summarized as follows.

• When HI deficiency is defined with respect to galaxies
of the same stellar mass and size, the bias of HI-deficient
galaxies relative to normal galaxies is larger than obtained
if the HI deficiency is defined with respect to galaxies of
the same mass. The same effect is not reproduced in the
semi-analytic model of Fu et al. (2010).

• When HI-richness is expressed with respect to galaxies
of the same mass and size (as well as with respect to galaxies
of the same mass, size and colour), HI-rich galaxies more
massive than 10^10 M⊙ are observed to be anti-biased with
respect to their counterparts with normal HI content. The
same is not true at lower stellar masses.

We have proposed that the disagreement between the
observations and the models might be resolved if processes
such as ram-pressure stripping, which depend on the density
of the ISM, are included in the models. We note that the
lowest mass galaxies have the lowest densities and are thus
the most likely to be affected by ram-pressure. In order to
test this hypothesis in more detail, we plan to look at the
behaviour of gas deficiency as a function of cluster-centric
radius in samples of nearby groups and clusters.

We also stress that next generation wide-field HI
surveys such as the ASKAP HI All-sky Survey (WALLABY)
and surveys carried out by the Apertif receiver array on
the Westerbork Synthesis Radio Telescope will measure HI
masses and sizes for samples of tens to hundreds of
thousands of galaxies at redshifts of around 0.1. These surveys
will make it possible to investigate clustering as a function
of the true HI content of a galaxy. It will be interesting to
investigate the degree to which the results conform with our
current analysis of "pseudo" HI content. Discrepancies may
reveal additional physics that we have not yet considered.
In the meantime, the construction of models that
reproduce the gas properties of galaxies as well as possible is
an important step towards building mock surveys that can
be safely used for making predictions in support of these
surveys.
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